Work:
Education:
Teaching:
Certifications:
LinkedIn: https://www.linkedin.com/in/daniel-martin-sheehan/
This Notebook uses https://rise.readthedocs.io/en/stable/ - use jupyter notebook to use, not lab :/
import warnings
warnings.filterwarnings('ignore')
# See README.md for all libraries and steps to create the environment.
import pandas as pd
import geopandas as gpd
from shapely import wkt
import matplotlib.pyplot as plt
import rasterio
from rasterstats import zonal_stats
A data model is a way of defining and representing real world surfaces and characteristics in GIS.
There are two primary types of spatial data models: Vector and Raster.
Source: http://gsp.humboldt.edu/OLM/Courses/GSP_216_Online/lesson3-1/data-models.html
Vector data represents features as discrete points, lines, and polygons
Raster data represents features as a rectangular matrix of square cells (pixels)

Source: http://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf
A triangulated irregular network (TIN)
is a data model commonly used to represent terrain heights - Source: http://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf
Typical use cases are for terrain inluts in Hydrologic Modeling / Watershed Modeling.

Source: http://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf
Vector data is very common, and is often used to represent features like roads and boundaries. Vector data comes in the form of points and lines that are geometrically and mathematically associated.
- Points:
* One pair of coordinates defines the location of a point feature.
- Polylines (LineStrings):
* Two or more pairs of coordinates that are connected define a line feature.
* A series of connected points
- Polygons:
* Multiple pairs of coordinates that are connected and closed define a polygon feature.
* A series of connected points that loop back to the first point
* Multiple "polygons" can exist in one layer
* Polygons can have internal polygons or "holes"
* The beginning and ending coordinates for a polygon are the same.
Source: http://gsp.humboldt.edu/OLM/Courses/GSP_216_Online/lesson3-1/data-models.html
One dependency (another Python Library GeoPandas depends on) is Shapely
Shapely is a Python package for set-theoretic analysis and manipulation of planar features using (via Python’s ctypes module) functions from the well known and widely deployed GEOS library. GEOS, a port of the Java Topology Suite (JTS), is the geometry engine of the PostGIS spatial extension for the PostgreSQL RDBMS. The designs of JTS and GEOS are largely guided by the Open Geospatial Consortium’s Simple Features Access Specification 1 and Shapely adheres mainly to the same set of standard classes and operations. Shapely is thereby deeply rooted in the conventions of the geographic information systems (GIS) world, but aspires to be equally useful to programmers working on non-conventional problems.
Source: https://shapely.readthedocs.io/en/latest/manual.html
from shapely.geometry import Point
point = Point(
0.0,
0.0)
point
from shapely.geometry import LineString
line = LineString([(0, 0), (2, 2)])
line
a = LineString([(0, 0), (1, 1), (1, 2), (2, 2)])
b = LineString([(0, 0), (1, 1), (2, 1), (2, 2)])
a
b
x = a.intersection(b)
x
from shapely.geometry import Polygon
c = Polygon([[1, 1], [-1, 3], [3, 3], [3, 1]])
d = Polygon([[0, 0], [0, 4], [4, 4], [4, 1]])
c
d
x = c.intersection(d)
x
from shapely.geometry import MultiPoint
e = MultiPoint([(0, 0), (1, 1), (1, -1)])
e
e.convex_hull # minimum bounding geometry
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Source: https://pandas.pydata.org/pandas-docs/stable/index.html
If totally new to Pandas, you can think of Pandas as the table (.dbf) of a Shapefile or as a Sheet in Excel or Google Spreadsheet.
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input. Source: https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html
csv_url = 'https://data.cityofnewyork.us/api/views/he7q-3hwy/rows.csv?accessType=DOWNLOAD'
df = pd.read_csv(csv_url)
# show a random sample of rows from the .csv data read in to the DataFrame object
df.sample(10)
| OBJECTID | URL | NAME | the_geom | LINE | |
|---|---|---|---|---|---|
| 86 | 1820 | http://web.mta.info/nyct/service/ | 33rd St & Queens Blvd at SW corner | POINT (-73.9317939996055 40.74476300048575) | 7 |
| 1277 | 1133 | http://web.mta.info/nyct/service/ | Broadway & 86th St at SE corner | POINT (-73.97628800043398 40.78807300127442) | 1 |
| 1488 | 1344 | http://web.mta.info/nyct/service/ | Lorimer St & Broadway at SW corner | POINT (-73.9466129996701 40.70332100121268) | J-M |
| 1831 | 1687 | http://web.mta.info/nyct/service/ | NaN | POINT (-73.99493799970026 40.725137000715904) | B-D-F-M-6 |
| 293 | 149 | http://web.mta.info/nyct/service/ | 7th Ave & 53rd St at NE corner | POINT (-73.98157800036606 40.762937001104106) | B-D-E |
| 1746 | 1602 | http://web.mta.info/nyct/service/ | 4th Ave & 45th St at NW corner | POINT (-74.00960200017478 40.64954600111841) | R |
| 556 | 412 | http://web.mta.info/nyct/service/ | Church St & Chambers St at NE corner | POINT (-74.00773899971288 40.71499200122226) | A-C-E-2-3 |
| 438 | 294 | http://web.mta.info/nyct/service/ | 14th Ave & 61st St at NW corner | POINT (-73.99687000021943 40.627110001144935) | D-N |
| 425 | 281 | http://web.mta.info/nyct/service/ | East 16 St & Avenue M at NW corner | POINT (-73.95946000002303 40.618140000630724) | B-Q |
| 1149 | 1005 | http://web.mta.info/nyct/service/ | 6th Ave & 49th St at SE corner | POINT (-73.98078800025752 40.75899600118979) | B-D-F-M-7 |
# show the first 15 lines using .head()
df.head(15)
| OBJECTID | URL | NAME | the_geom | LINE | |
|---|---|---|---|---|---|
| 0 | 1734 | http://web.mta.info/nyct/service/ | Birchall Ave & Sagamore St at NW corner | POINT (-73.86835600032798 40.84916900104506) | 2-5 |
| 1 | 1735 | http://web.mta.info/nyct/service/ | Birchall Ave & Sagamore St at NE corner | POINT (-73.86821300022677 40.84912800131844) | 2-5 |
| 2 | 1736 | http://web.mta.info/nyct/service/ | Morris Park Ave & 180th St at NW corner | POINT (-73.87349900050798 40.84122300105249) | 2-5 |
| 3 | 1737 | http://web.mta.info/nyct/service/ | Morris Park Ave & 180th St at NW corner | POINT (-73.8728919997833 40.84145300067447) | 2-5 |
| 4 | 1738 | http://web.mta.info/nyct/service/ | Boston Rd & 178th St at SW corner | POINT (-73.87962300013866 40.84081500075867) | 2-5 |
| 5 | 1739 | http://web.mta.info/nyct/service/ | Boston Rd & E Tremont Ave at NW corner | POINT (-73.88000500027815 40.840434000875874) | 2-5 |
| 6 | 1740 | http://web.mta.info/nyct/service/ | Boston Rd & E Tremont Ave at NE corner | POINT (-73.87983300021861 40.84035400111976) | 2-5 |
| 7 | 1741 | http://web.mta.info/nyct/service/ | Boston Rd & 178th St at SE corner | POINT (-73.8795549998336 40.84063900116436) | 2-5 |
| 8 | 1742 | http://web.mta.info/nyct/service/ | Boston Rd & 178th St at NW corner | POINT (-73.87939700013239 40.84107800066419) | 2-5 |
| 9 | 1743 | http://web.mta.info/nyct/service/ | Boston Rd & 174th St at SW corner | POINT (-73.88804799985908 40.83732500129732) | 2-5 |
| 10 | 1744 | http://web.mta.info/nyct/service/ | Boston Rd & 174th St at NW corner | POINT (-73.88775499967073 40.83759900080287) | 2-5 |
| 11 | 1745 | http://web.mta.info/nyct/service/ | Southern Blvd & Freeman St at SW corner | POINT (-73.8920200000541 40.82990300129103) | 2-5 |
| 12 | 1746 | http://web.mta.info/nyct/service/ | Southern Blvd & Freeman St at NW corner | POINT (-73.89201499957316 40.83036900118696) | 2-5 |
| 13 | 1747 | http://web.mta.info/nyct/service/ | Westchester Ave & Simpson St at NE corner | POINT (-73.8928969996593 40.824356000986995) | 2-5 |
| 14 | 1748 | http://web.mta.info/nyct/service/ | Westchester Ave & Simpson St at NW corner | POINT (-73.89311400018327 40.82417700086825) | 2-5 |
df.shape
(1928, 5)
df.value_counts('LINE')
LINE
F 125
1 120
6 113
2-5 92
A 75
...
A-H 1
A-FS 1
A-C-J-L 1
A-C-FS 1
e-F-M-R 1
Length: 97, dtype: int64
from shapely import wkt
gdf = gpd.GeoDataFrame(
df, geometry=df['the_geom'].apply(wkt.loads),
crs="EPSG:4326")
gdf.plot();
gdf.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
boro_geojson_link = 'https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=GeoJSON'
# read in geojson as geopandas GeoDataFrame
gdf = gpd.read_file(boro_geojson_link)
gdf.plot(figsize=(5, 5)); # nyc boros
gdf.head()
| boro_code | boro_name | shape_area | shape_leng | geometry | |
|---|---|---|---|---|---|
| 0 | 5 | Staten Island | 1623620725.06 | 325917.353702 | MULTIPOLYGON (((-74.05051 40.56642, -74.05047 ... |
| 1 | 2 | Bronx | 1187174784.85 | 463179.772813 | MULTIPOLYGON (((-73.89681 40.79581, -73.89694 ... |
| 2 | 1 | Manhattan | 636520830.768 | 357564.316391 | MULTIPOLYGON (((-74.01093 40.68449, -74.01193 ... |
| 3 | 3 | Brooklyn | 1934143372.64 | 728197.541089 | MULTIPOLYGON (((-73.86327 40.58388, -73.86381 ... |
| 4 | 4 | Queens | 3041418506.76 | 888199.731579 | MULTIPOLYGON (((-73.82645 40.59053, -73.82642 ... |
# what is the coordinate reference system?
gdf.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
inplace=True¶gdf.to_crs(epsg=2263, inplace=True)
# changed the state of the projection systems
gdf.crs
<Derived Projected CRS: EPSG:2263> Name: NAD83 / New York Long Island (ftUS) Axis Info [cartesian]: - X[east]: Easting (US survey foot) - Y[north]: Northing (US survey foot) Area of Use: - name: United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk. - bounds: (-74.26, 40.47, -71.8, 41.3) Coordinate Operation: - name: SPCS83 New York Long Island zone (US Survey feet) - method: Lambert Conic Conformal (2SP) Datum: North American Datum 1983 - Ellipsoid: GRS 1980 - Prime Meridian: Greenwich
Very broadly, Geoprocessing is any operation involving geospatial data or methods. The Geographic Information Systems (GIS) software company Esri refers to it as "Computing with geographic data."
It is commonly used interchangeably with the term Spatial Analysis.
Professor Jochen Albrecht defines Geoprocessing as;
...any GIS operation used to manipulate data. A typical geoprocessing operation takes an input dataset, performs an operation on that dataset, and returns the result of the operation as an output dataset, also referred to as derived data.
Common geoprocessing operations are geographic feature overlay, feature selection and analysis, topology processing, and data conversion.
Geoprocessing allows you to define, manage, and analyze geographic information used to make decisions. - Jochen Albrecht
There are several Geoprocessing libraries and technologies. A few notable open source ones are;
SELECT
ST_Intersects(a.the_geom, b.the_geom) AS the_geom
FROM
a, b
ArcPy is a Python site package that provides a useful and productive way to perform geographic data analysis, data conversion, data management, and map automation with Python. Esri
Arcpy is for ArcGIS Desktop or ArcGIS Pro. The ArcGIS API for Python, using ArcGIS Online is also available for multiple platforms.
arcpy.analysis.Intersect([feature_1, feature_2], out_feature)
Returns a GeoSeries of geometries representing all points within a given distance of each geometric object. Source: https://geopandas.org/geometric_manipulations.html
fig, ax = plt.subplots(figsize=(6, 6))
gdf.plot(ax=ax)
gdf.buffer(5000).plot(ax=ax, facecolor="none", alpha=0.7, edgecolor='red', linewidth=3)
plt.title('Buffer');
Returns a GeoSeries of points for each geometric centroid. Source: https://geopandas.org/geometric_manipulations.html#GeoSeries.boundary
In mathematics and physics, the centroid or geometric center of a plane figure is the arithmetic mean position of all the points in the figure. Informally, it is the point at which a cutout of the shape could be perfectly balanced on the tip of a pin.[1]
The definition extends to any object in n-dimensional space: its centroid is the mean position of all the points in all of the coordinate directions.[2] Source: https://en.wikipedia.org/wiki/Centroid
fig, ax = plt.subplots(figsize=(7, 7))
gdf.plot(ax=ax)
gdf.centroid.plot(ax=ax, facecolor="red", markersize=200)
plt.title('Centroid');
Returns a GeoSeries of geometries representing the smallest convex Polygon containing all the points in each object unless the number of points in the object is less than three. For two points, the convex hull collapses to a LineString; for 1, a Point. Source: https://geopandas.org/geometric_manipulations.html#GeoSeries.boundary
In geometry, the convex hull or convex envelope or convex closure of a shape is the smallest convex set that contains it. For a bounded subset of the plane, the convex hull may be visualized as the shape enclosed by a rubber band stretched around the subset. Source: https://en.wikipedia.org/wiki/Convex_hull
fig, ax = plt.subplots(figsize=(6, 6))
gdf.plot(ax=ax)
gdf.convex_hull.plot(ax=ax, facecolor="none", alpha=0.7, edgecolor='red', linewidth=3)
plt.title('Convex Hull');
Returns a GeoSeries of geometries representing the point or smallest rectangular polygon (with sides parallel to the coordinate axes) that contains each object. Source: https://geopandas.org/geometric_manipulations.html#GeoSeries.boundary
fig, ax = plt.subplots(figsize=(7, 7))
gdf.plot(
ax=ax,
)
gdf.envelope.plot(ax=ax, facecolor="none", alpha=0.7,edgecolor='red', linewidth=3)
plt.title('Envelope');
Returns a GeoSeries containing a simplified representation of each object. Source: https://geopandas.org/geometric_manipulations.html#GeoSeries.boundary
fig, ax = plt.subplots(figsize=(5, 5))
gdf.plot(ax=ax)
gdf.simplify(10000).plot(ax=ax, facecolor="none", alpha=0.7, edgecolor='red', linewidth=3)
plt.title('Simplify');
Add GeoPandas Spatial Join Examples
When working with multiple spatial datasets – especially multiple polygon or line datasets – users often wish to create new shapes based on places where those datasets overlap (or don’t overlap). These manipulations are often referred using the language of sets – intersections, unions, and differences. These types of operations are made available in the geopandas library through the overlay function.
The basic idea is demonstrated by the graphic below but keep in mind that overlays operate at the DataFrame level, not on individual geometries, and the properties from both are retained. In effect, for every shape in the first GeoDataFrame, this operation is executed against every other shape in the other GeoDataFrame. Source: https://geopandas.org/set_operations.html
Source: https://geopandas.org/set_operations.html
The Dimensionally Extended nine-Intersection Model (DE-9IM) is a topological model and a standard used to describe the spatial relations of two regions (two geometries in two-dimensions, R2), in geometry, point-set topology, geospatial topology, and fields related to computer spatial analysis. The spatial relations expressed by the model are invariant to rotation, translation and scaling transformations. Source: https://en.wikipedia.org/wiki/DE-9IM
from IPython.display import IFrame
IFrame(src='https://en.wikipedia.org/wiki/DE-9IM', width=1000, height=600)
import geopandas
from shapely.geometry import Polygon
polys1 = geopandas.GeoSeries([
Polygon([(0,0), (2,0), (2,2), (0,2)]),
Polygon([(2,2), (4,2), (4,4), (2,4)])])
polys2 = geopandas.GeoSeries([
Polygon([(1,1), (3,1), (3,3), (1,3)]),
Polygon([(3,3), (5,3), (5,5), (3,5)])])
df1 = geopandas.GeoDataFrame({'geometry': polys1, 'df1':[1,2]})
df2 = geopandas.GeoDataFrame({'geometry': polys2, 'df2':[1,2]})
ax = df1.plot(color='red', figsize=(7, 7));
df2.plot(ax=ax, color='green', alpha=0.5);
The output feature class will contain polygons representing the geometric union of all the inputs as well as all the fields from all the input feature classes. See below for examples of how attribute values are assigned to the output features. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-union-analysis-works.htm

res_union = geopandas.overlay(
df1,
df2,
how='union')
res_union
| df1 | df2 | geometry | |
|---|---|---|---|
| 0 | 1.0 | 1.0 | POLYGON ((2.00000 2.00000, 2.00000 1.00000, 1.... |
| 1 | 2.0 | 1.0 | POLYGON ((2.00000 2.00000, 2.00000 3.00000, 3.... |
| 2 | 2.0 | 2.0 | POLYGON ((4.00000 4.00000, 4.00000 3.00000, 3.... |
| 3 | 1.0 | NaN | POLYGON ((2.00000 0.00000, 0.00000 0.00000, 0.... |
| 4 | 2.0 | NaN | MULTIPOLYGON (((3.00000 3.00000, 4.00000 3.000... |
| 5 | NaN | 1.0 | MULTIPOLYGON (((2.00000 2.00000, 3.00000 2.000... |
| 6 | NaN | 2.0 | POLYGON ((3.00000 5.00000, 5.00000 5.00000, 5.... |
ax = res_union.plot(alpha=0.5, cmap='tab10', figsize=(7, 7))
df1.plot(ax=ax, facecolor='none', edgecolor='k');
df2.plot(ax=ax, facecolor='none', edgecolor='k');
The Intersect tool calculates the geometric intersection of any number of feature classes and feature layers. The features, or portion of features, that are common to all inputs (that is, they intersect) will be written to the output feature class. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-intersect-analysis-works.htm
Diagrams below from: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-intersect-analysis-works.htm


Diagrams above from: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-intersect-analysis-works.htm
res_intersection = geopandas.overlay(
df1,
df2,
how='intersection')
res_intersection
| df1 | df2 | geometry | |
|---|---|---|---|
| 0 | 1 | 1 | POLYGON ((2.00000 2.00000, 2.00000 1.00000, 1.... |
| 1 | 2 | 1 | POLYGON ((2.00000 2.00000, 2.00000 3.00000, 3.... |
| 2 | 2 | 2 | POLYGON ((4.00000 4.00000, 4.00000 3.00000, 3.... |
ax = res_intersection.plot(cmap='tab10', figsize=(6, 6))
df1.plot(ax=ax, facecolor='none', edgecolor='k');
df2.plot(ax=ax, facecolor='none', edgecolor='k');
Features or portions of features in the input and update features that do not overlap will be written to the output feature class. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/symmetrical-difference.htm

res_symdiff = geopandas.overlay(
df1,
df2,
how='symmetric_difference')
res_symdiff
| df1 | df2 | geometry | |
|---|---|---|---|
| 0 | 1.0 | NaN | POLYGON ((2.00000 0.00000, 0.00000 0.00000, 0.... |
| 1 | 2.0 | NaN | MULTIPOLYGON (((3.00000 3.00000, 4.00000 3.000... |
| 2 | NaN | 1.0 | MULTIPOLYGON (((2.00000 2.00000, 3.00000 2.000... |
| 3 | NaN | 2.0 | POLYGON ((3.00000 5.00000, 5.00000 5.00000, 5.... |
ax = res_symdiff.plot(cmap='tab10', figsize=(6, 6))
df1.plot(ax=ax, facecolor='none', edgecolor='k');
df2.plot(ax=ax, facecolor='none', edgecolor='k');
Seems most similar to Esri's Erase overlay function.
Creates a feature class by overlaying the Input Features with the polygons of the Erase Features. Only those portions of the input features falling outside the erase features outside boundaries are copied to the output feature class. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/erase.htm

res_difference = geopandas.overlay(
df1,
df2,
how='difference')
res_difference
| geometry | df1 | |
|---|---|---|
| 0 | POLYGON ((2.00000 0.00000, 0.00000 0.00000, 0.... | 1 |
| 1 | MULTIPOLYGON (((3.00000 3.00000, 4.00000 3.000... | 2 |
ax = res_difference.plot(cmap='tab10', figsize=(6, 6))
df1.plot(ax=ax, facecolor='none', edgecolor='k')
df2.plot(ax=ax, facecolor='none', edgecolor='k');
countries = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
countries.plot(cmap='tab10', figsize=(10, 10));
countries = countries[['continent', 'geometry']]
continents = countries.dissolve(by='continent')
continents.plot(cmap='tab10', figsize=(10, 10));
countries = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
countries.head(14)
| pop_est | continent | name | iso_a3 | gdp_md_est | geometry | |
|---|---|---|---|---|---|---|
| 0 | 889953.0 | Oceania | Fiji | FJI | 5496 | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
| 1 | 58005463.0 | Africa | Tanzania | TZA | 63177 | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
| 2 | 603253.0 | Africa | W. Sahara | ESH | 907 | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
| 3 | 37589262.0 | North America | Canada | CAN | 1736425 | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
| 4 | 328239523.0 | North America | United States of America | USA | 21433226 | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |
| 5 | 18513930.0 | Asia | Kazakhstan | KAZ | 181665 | POLYGON ((87.35997 49.21498, 86.59878 48.54918... |
| 6 | 33580650.0 | Asia | Uzbekistan | UZB | 57921 | POLYGON ((55.96819 41.30864, 55.92892 44.99586... |
| 7 | 8776109.0 | Oceania | Papua New Guinea | PNG | 24829 | MULTIPOLYGON (((141.00021 -2.60015, 142.73525 ... |
| 8 | 270625568.0 | Asia | Indonesia | IDN | 1119190 | MULTIPOLYGON (((141.00021 -2.60015, 141.01706 ... |
| 9 | 44938712.0 | South America | Argentina | ARG | 445445 | MULTIPOLYGON (((-68.63401 -52.63637, -68.25000... |
| 10 | 18952038.0 | South America | Chile | CHL | 282318 | MULTIPOLYGON (((-68.63401 -52.63637, -68.63335... |
| 11 | 86790567.0 | Africa | Dem. Rep. Congo | COD | 50400 | POLYGON ((29.34000 -4.49998, 29.51999 -5.41998... |
| 12 | 10192317.3 | Africa | Somalia | SOM | 4719 | POLYGON ((41.58513 -1.68325, 40.99300 -0.85829... |
| 13 | 52573973.0 | Africa | Kenya | KEN | 95503 | POLYGON ((39.20222 -4.67677, 37.76690 -3.67712... |
continents = countries.dissolve(
by='continent',
aggfunc='sum')
continents
| geometry | pop_est | gdp_md_est | |
|---|---|---|---|
| continent | |||
| Africa | MULTIPOLYGON (((-11.43878 6.78592, -11.70819 6... | 1.306370e+09 | 2455514 |
| Antarctica | MULTIPOLYGON (((-61.13898 -79.98137, -60.61012... | 4.490000e+03 | 898 |
| Asia | MULTIPOLYGON (((48.67923 14.00320, 48.23895 13... | 4.550277e+09 | 32725478 |
| Europe | MULTIPOLYGON (((-53.55484 2.33490, -53.77852 2... | 7.454125e+08 | 21587850 |
| North America | MULTIPOLYGON (((-155.22217 19.23972, -155.5421... | 5.837560e+08 | 25075988 |
| Oceania | MULTIPOLYGON (((147.91405 -43.21152, 147.56456... | 4.120487e+07 | 1647113 |
| Seven seas (open ocean) | POLYGON ((68.93500 -48.62500, 69.58000 -48.940... | 1.400000e+02 | 16 |
| South America | MULTIPOLYGON (((-68.63999 -55.58002, -69.23210... | 4.270667e+08 | 3852015 |
continents.plot(
column = 'pop_est',
scheme='quantiles',
cmap='YlOrRd',
figsize=(10, 10),
);
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))
countries = world[['geometry', 'name', 'pop_est', 'gdp_md_est']]
countries = countries.rename(columns={'name':'country'})
cities.plot(color='red');
countries.plot();
countries.head()
| geometry | country | pop_est | gdp_md_est | |
|---|---|---|---|---|
| 0 | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... | Fiji | 889953.0 | 5496 |
| 1 | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... | Tanzania | 58005463.0 | 63177 |
| 2 | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... | W. Sahara | 603253.0 | 907 |
| 3 | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... | Canada | 37589262.0 | 1736425 |
| 4 | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... | United States of America | 328239523.0 | 21433226 |
cities.head()
| name | geometry | |
|---|---|---|
| 0 | Vatican City | POINT (12.45339 41.90328) |
| 1 | San Marino | POINT (12.44177 43.93610) |
| 2 | Vaduz | POINT (9.51667 47.13372) |
| 3 | Lobamba | POINT (31.20000 -26.46667) |
| 4 | Luxembourg | POINT (6.13000 49.61166) |
cities_with_country = geopandas.sjoin(
cities,
countries,
how="inner",
op='intersects')
cities_with_country.head(20)
| name | geometry | index_right | country | pop_est | gdp_md_est | |
|---|---|---|---|---|---|---|
| 0 | Vatican City | POINT (12.45339 41.90328) | 141 | Italy | 60297396.0 | 2003576 |
| 1 | San Marino | POINT (12.44177 43.93610) | 141 | Italy | 60297396.0 | 2003576 |
| 226 | Rome | POINT (12.48131 41.89790) | 141 | Italy | 60297396.0 | 2003576 |
| 2 | Vaduz | POINT (9.51667 47.13372) | 114 | Austria | 8877067.0 | 445075 |
| 212 | Vienna | POINT (16.36469 48.20196) | 114 | Austria | 8877067.0 | 445075 |
| 3 | Lobamba | POINT (31.20000 -26.46667) | 73 | eSwatini | 1148130.0 | 4471 |
| 16 | Mbabane | POINT (31.13333 -26.31665) | 73 | eSwatini | 1148130.0 | 4471 |
| 4 | Luxembourg | POINT (6.13000 49.61166) | 128 | Luxembourg | 619896.0 | 71104 |
| 9 | Bir Lehlou | POINT (-9.65252 26.11917) | 2 | W. Sahara | 603253.0 | 907 |
| 10 | Monaco | POINT (7.40691 43.73965) | 43 | France | 67059887.0 | 2715518 |
| 13 | Andorra | POINT (1.52659 42.51075) | 43 | France | 67059887.0 | 2715518 |
| 186 | Geneva | POINT (6.14003 46.21001) | 43 | France | 67059887.0 | 2715518 |
| 235 | Paris | POINT (2.35299 48.85809) | 43 | France | 67059887.0 | 2715518 |
| 14 | Port-of-Spain | POINT (-61.51703 10.65200) | 175 | Trinidad and Tobago | 1394973.0 | 24269 |
| 15 | Kigali | POINT (30.05859 -1.95164) | 169 | Rwanda | 12626950.0 | 10354 |
| 17 | Juba | POINT (31.58003 4.82998) | 176 | S. Sudan | 11062113.0 | 11998 |
| 18 | The Hague | POINT (4.26996 52.08004) | 130 | Netherlands | 17332850.0 | 907050 |
| 192 | Amsterdam | POINT (4.91469 52.35191) | 130 | Netherlands | 17332850.0 | 907050 |
| 19 | Ljubljana | POINT (14.51497 46.05529) | 150 | Slovenia | 2087946.0 | 54174 |
| 20 | Bratislava | POINT (17.11698 48.15002) | 152 | Slovakia | 5454073.0 | 105079 |
cities_with_country.plot(color='red');
fig, ax = plt.subplots(figsize=(10, 4))
countries.plot(ax=ax, facecolor='gray')
cities_with_country.plot(
ax=ax,
column='gdp_md_est',
scheme='quantiles',
legend=True,
cmap='Greens',
edgecolor='white',
alpha=0.6,
markersize=cities_with_country['gdp_md_est']/5000.,
legend_kwds={
"title": 'GDP',
"loc": 4})
plt.title('Gross Domestic Product (GDP) - 2016', {'fontsize': 27});
plt.savefig('img/gdp.png')
[Rasters] are frequently represented as a continuous grid of square cells, each containing a value indicating the (estimated) average height or strength of the field in that cell. In most of the literature and within software packages the points/ lines/ areas model is described as vector data, whilst the grid model is described as raster (or image) data. Source: https://www.spatialanalysisonline.com/HTML/index.html
Raster Basics¶
A raster is a rectangular grid of pixels with values that can continuous (e.g. elevation) or categorical (e.g. land use). This data structure is very common - jpg images on the web, photos from your digital camera, and the jupyterhub icon are all rasters. Source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb
Common properties of any raster:
number of rows and columns (sometimes referred to as lines and samples) data type (dtype, or bit depth) - e.g., 8-bit (2^8 possible values, 0-255) some kind of resolution information, often dots per inch (dpi) with raster graphics A geospatial raster is only different from a digital photo in that it is accompanied by metadata that connects the grid to a particular location. Source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb
Source: National Ecological Observatory Network (NEON) via: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb
Examples of categorical rasters¶
Some rasters contain categorical data where each pixel represents a discrete class such as a landcover type (e.g., "coniferous forest" or "grassland") rather than a continuous value such as elevation or temperature. Some examples of classified maps include:
The following map shows elevation data for the NEON Harvard Forest field site. In this map, the elevation data (a continuous variable) has been divided up into categories to yield a categorical raster. Source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb
Source: Homer, C.G., et al., 2015, Completion of the 2011 National Land Cover Database for the conterminous United States-Representing a decade of land cover change information. Photogrammetric Engineering and Remote Sensing, v. 81, no. 5, p. 345-354) via https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html
resolution of a raster represents the area on the ground that each pixel of the raster covers. The image below illustrates the effect of changes in resolution. Source: https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html
Source: National Ecological Observatory Network (NEON) via https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html
A raster can contain one or more bands. One type of multi-band raster dataset that is familiar to many of us is a color image. A basic color image consists of three bands: red, green, and blue. Each band represents light reflected from the red, green or blue portions of the electromagnetic spectrum. The pixel brightness for each band, when composited creates the colors that we see in an image. Source: https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html
Source: National Ecological Observatory Network (NEON) via https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html
import ipyleaflet # source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/3-visualization-and-modis.ipynb
from ipyleaflet import Map, Rectangle, basemaps, basemap_to_tiles, TileLayer, SplitMapControl, Polygon
import ipywidgets
import datetime
import re
bbox = [43.16, -11.32, 43.54, -11.96]
west, north, east, south = bbox
bbox_ctr = [0.5*(north+south), 0.5*(west+east)]
m = Map(center=bbox_ctr, zoom=5) # MODIS great for large areas (onl)
right_layer = basemap_to_tiles(basemaps.NASAGIBS.ModisTerraTrueColorCR, "2020-04-01")
left_layer = TileLayer()
control = SplitMapControl(left_layer=left_layer, right_layer=right_layer)
m.add_control(control)
m
Map(center=[-11.64, 43.349999999999994], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_t…
import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# let's allow our dataframe printouts in jupyter to be larger
simple_image = np.load('data/image_file.npy')
# .npy is a numpy file storage format
simple_image
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 11, 104, 159, 159, 232, 195, 102, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 19, 154, 227, 254, 235, 174, 167, 233, 184, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 106, 217, 254, 239, 159, 23, 0, 0, 68, 221, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
72, 249, 251, 131, 11, 0, 0, 53, 13, 2, 40, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 26,
243, 250, 72, 0, 0, 5, 184, 251, 103, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 152,
254, 170, 0, 0, 0, 148, 254, 254, 85, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 225,
254, 41, 0, 30, 198, 253, 254, 88, 2, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 231,
254, 67, 22, 181, 251, 246, 108, 20, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 37, 241,
254, 146, 240, 254, 238, 62, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 226,
254, 254, 254, 192, 21, 3, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 91, 229,
254, 246, 120, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 56, 248, 254,
254, 190, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 92, 249, 229, 131,
213, 254, 136, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 69, 233, 248, 65, 0,
178, 254, 143, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 9, 73, 252, 248, 88, 0, 0,
84, 249, 180, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 44, 254, 246, 50, 0, 0, 25,
210, 247, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 25, 213, 254, 84, 0, 2, 82, 223,
254, 203, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 63, 254, 254, 4, 33, 168, 254, 254,
176, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 31, 225, 254, 227, 240, 254, 235, 133,
9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 42, 193, 254, 254, 222, 29, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]], dtype=uint8)
pd.DataFrame(simple_image).head(28)
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 104 | 159 | 159 | 232 | 195 | 102 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 154 | 227 | 254 | 235 | 174 | 167 | 233 | 184 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 106 | 217 | 254 | 239 | 159 | 23 | 0 | 0 | 68 | 221 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 72 | 249 | 251 | 131 | 11 | 0 | 0 | 53 | 13 | 2 | 40 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 26 | 243 | 250 | 72 | 0 | 0 | 5 | 184 | 251 | 103 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 152 | 254 | 170 | 0 | 0 | 0 | 148 | 254 | 254 | 85 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 21 | 225 | 254 | 41 | 0 | 30 | 198 | 253 | 254 | 88 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 231 | 254 | 67 | 22 | 181 | 251 | 246 | 108 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 37 | 241 | 254 | 146 | 240 | 254 | 238 | 62 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 41 | 226 | 254 | 254 | 254 | 192 | 21 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 91 | 229 | 254 | 246 | 120 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 56 | 248 | 254 | 254 | 190 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 92 | 249 | 229 | 131 | 213 | 254 | 136 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 69 | 233 | 248 | 65 | 0 | 178 | 254 | 143 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 18 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 73 | 252 | 248 | 88 | 0 | 0 | 84 | 249 | 180 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19 | 0 | 0 | 0 | 0 | 0 | 0 | 44 | 254 | 246 | 50 | 0 | 0 | 25 | 210 | 247 | 86 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 20 | 0 | 0 | 0 | 0 | 0 | 25 | 213 | 254 | 84 | 0 | 2 | 82 | 223 | 254 | 203 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 21 | 0 | 0 | 0 | 0 | 0 | 63 | 254 | 254 | 4 | 33 | 168 | 254 | 254 | 176 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 22 | 0 | 0 | 0 | 0 | 0 | 31 | 225 | 254 | 227 | 240 | 254 | 235 | 133 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 23 | 0 | 0 | 0 | 0 | 0 | 0 | 42 | 193 | 254 | 254 | 222 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
import matplotlib.pyplot as plt
plt.imshow(simple_image, cmap='gray')
plt.show();
Bands 2, 3, and 4 are visible blue, green, and red. Source: https://landsat.gsfc.nasa.gov/landsat-8/landsat-8-bands/
Much of this is referenced from: https://automating-gis-processes.github.io/CSC/lessons/L5/overview.html
import urllib
# https://automating-gis-processes.github.io/CSC/notebooks/L5/reading-raster.html
raster_url = 'https://github.com/Automating-GIS-processes/CSC18/raw/master/data/Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif'
helsinki_local_file = 'data/raster/Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif'
urllib.request.urlretrieve(raster_url, helsinki_local_file)
('data/raster/Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif',
<http.client.HTTPMessage at 0x16a7d0fa0>)
Rasterio: access to geospatial raster data
Geographic information systems use GeoTIFF and other formats to organize and store gridded raster datasets such as satellite imagery and terrain models. Rasterio reads and writes these formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON.
Source: https://rasterio.readthedocs.io/en/latest/index.html
import rasterio
dataset = rasterio.open(helsinki_local_file)
dataset
<open DatasetReader name='data/raster/Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif' mode='r'>
plt.figure(figsize=(8, 8))
plt.imshow(dataset.read(1), cmap='Blues')
plt.title('Band 1 - Blue');
dataset.count
# number of raster bands
7
from matplotlib import pyplot
from rasterio.plot import show
fig, (axr, axg, axb) = pyplot.subplots(1,3, figsize=(21,7))
show((dataset, 3), ax=axr, cmap='Reds', title='red channel')
show((dataset, 2), ax=axg, cmap='Greens', title='green channel')
show((dataset, 1), ax=axb, cmap='Blues', title='blue channel')
pyplot.show()
from rasterio.plot import show_hist
show_hist(
dataset, bins=50, lw=0.0, stacked=False, alpha=0.3,
histtype='stepfilled', title="Histogram of Bands")
dataset.bounds
BoundingBox(left=698592.0, bottom=6656859.0, right=735300.0, top=6697870.5)
'X - ', dataset.bounds[0], dataset.bounds[2], '-', 'Y - ', dataset.bounds[1], dataset.bounds[3]
('X - ', 698592.0, 735300.0, '-', 'Y - ', 6656859.0, 6697870.5)
np.mean([dataset.bounds[0], dataset.bounds[2]]), np.mean([dataset.bounds[1], dataset.bounds[3]])
(716946.0, 6677364.75)

Source: https://desktop.arcgis.com/en/arcmap/10.7/tools/spatial-analyst-toolbox/cell-statistics.htm
sample_coordinates = [
(716946.0, 6677364.75),
(713046.0, 6676164.75),
(714846.0, 6666164.75),
(720846.0, 6668164.75),
]
samples = [x for x in dataset.sample(sample_coordinates)]
samples
[array([ 65, 48, 34, 44, 41, 128, 27], dtype=uint8), array([ 85, 70, 69, 49, 78, 125, 61], dtype=uint8), array([ 55, 34, 25, 10, 10, 105, 11], dtype=uint8), array([ 59, 34, 28, 11, 11, 104, 10], dtype=uint8)]

Source: https://gisgeography.com/zonal-statistics/
Zonal statistics summarizes the cells for a given region or line.
from shapely.geometry import Polygon
Polygon(sample_coordinates)
from shapely.geometry import mapping, Polygon
import fiona
poly = Polygon(sample_coordinates)
schema = {'geometry': 'Polygon'}
with fiona.open('data/sample_coordinates.shp', 'w', 'ESRI Shapefile', schema) as c:
c.write({'geometry': mapping(poly)})
# save a shapefile with fiona
import geopandas as gpd
fig, ax = plt.subplots(figsize=(10, 10))
p = gpd.GeoSeries(poly)
p.plot(ax=ax, facecolor='none', edgecolor='red')
rasterio.plot.show(dataset, ax=ax)
plt.show()
from rasterstats import zonal_stats
stats = zonal_stats('data/sample_coordinates.shp', helsinki_local_file, band=1, stats=['min', 'max', 'median', 'majority', 'sum'])
stats
[{'min': 27.0,
'max': 252.0,
'sum': 4118148.0,
'median': 61.0,
'majority': 60.0}]
stats = zonal_stats('data/sample_coordinates.shp', helsinki_local_file, band=6, stats=['min', 'max', 'median', 'majority', 'sum'])
stats
[{'min': 103.0,
'max': 160.0,
'sum': 6997422.0,
'median': 106.0,
'majority': 105.0}]
import folium
m = folium.Map(
location=[45.372, -121.6972],
zoom_start=3,
tiles='Stamen Terrain')
folium.Marker(
[63.0690, -151.0063],
popup='<i>Denali</i>',
).add_to(m)
folium.Marker(
[36.5786, -118.2920],
popup='<b>Mount Whitney</b>',
).add_to(m)
m